Distribution-to-distribution (D2D) point cloud registration techniques such as the Normal Distributions Transform (NDT) can align point clouds sampled from unstructured scenes and provide accurate bounds on their own solution error covariance, an important feature for safety-of-life navigation tasks. D2D methods rely on the assumption of a static scene and are therefore susceptible to bias from range-shadowing, self-occlusion, moving objects, and distortion artifacts as the recording device moves between frames. Deep learning-based approaches can achieve higher accuracy in dynamic scenes by relaxing these constraints; however, DNNs produce uninterpretable solutions, which can be problematic from a safety perspective. In this paper, we propose a method of down-sampling LIDAR point clouds to exclude voxels that violate the assumption of a static scene and introduce error to the D2D scan matching process. Our approach uses a solution consistency filter, identifying and flagging voxels where D2D contributions disagree with local estimates from a PointNet-based registration network.
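The consistency filter described in this abstract can be illustrated with a minimal sketch. The abstract does not say how disagreement is measured, so the Euclidean distance test, the `threshold` parameter, the function name, and the sample values below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def consistency_filter(d2d_estimates, local_estimates, threshold):
    """Return indices of voxels whose two translation estimates agree.

    d2d_estimates, local_estimates: lists of (x, y, z) tuples, one per voxel.
    threshold: hypothetical maximum allowed disagreement (same units).
    """
    keep = []
    for i, (a, b) in enumerate(zip(d2d_estimates, local_estimates)):
        # Assumed disagreement measure: Euclidean distance between estimates.
        if math.dist(a, b) <= threshold:
            keep.append(i)
    return keep


# Made-up per-voxel translation estimates; the third voxel might contain
# a moving object, so its D2D contribution disagrees with the local one.
d2d = [(0.10, 0.00, 0.00), (0.12, 0.01, 0.00), (0.90, 0.40, 0.00)]
net = [(0.11, 0.00, 0.00), (0.10, 0.00, 0.01), (0.10, 0.02, 0.00)]
kept = consistency_filter(d2d, net, threshold=0.05)
print(kept)
```

Only the voxels returned in `kept` would be passed on to the D2D scan matching step in this sketch.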
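In this paper, we propose a method for mitigating shadowing errors in LIDAR scan matching through a preprocessing step based on a spherical grid. Because the grid is aligned with the LIDAR beams, it is relatively easy to eliminate shadow edges, which cause systematic errors in LIDAR scan matching. As we show through simulation, our proposed algorithm gives better results than ground-plane removal, the most common shadow-mitigation strategy. Unlike ground-plane removal, our method applies to arbitrary terrain (e.g., shadows on urban walls, shadows in hilly terrain) while retaining key LIDAR points on the ground that are critical for estimating changes in height, pitch, and roll. Our preprocessing algorithm can be used with a range of scan-matching methods; for voxel-based scan-matching methods, however, it provides additional benefits by reducing computational cost and distributing LIDAR points more evenly among voxels.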
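A beam-aligned spherical grid can be sketched as follows. This shows only the binning of Cartesian points into azimuth/elevation cells; the shadow-edge removal itself is not described in the abstract and is not attempted here. The function name and the bin counts are assumptions for illustration.

```python
import math


def spherical_bin(point, az_bins, el_bins):
    """Map a Cartesian point (sensor at the origin) to an (azimuth, elevation) cell."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)  # in [-pi, pi]
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, z / r)) if r > 0 else 0.0
    elevation = math.asin(ratio)  # in [-pi/2, pi/2]
    # Scale each angle to a bin index; the upper boundary falls in the last bin.
    az_idx = min(int((azimuth + math.pi) / (2 * math.pi) * az_bins), az_bins - 1)
    el_idx = min(int((elevation + math.pi / 2) / math.pi * el_bins), el_bins - 1)
    return az_idx, el_idx


print(spherical_bin((1.0, 0.0, 0.0), az_bins=8, el_bins=4))
print(spherical_bin((0.0, 0.0, 1.0), az_bins=8, el_bins=4))
```

Because every cell spans a cone of beam directions, points in one cell differ mainly in range, which is what makes range discontinuities easier to reason about cell by cell.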
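LIDAR data can be used to generate point clouds for navigating autonomous vehicles or mobile robotic platforms. Scan matching, the process of estimating the rigid transformation that best aligns two point clouds, is the basis of LIDAR odometry, a form of dead reckoning. LIDAR odometry is particularly useful when GPS is unavailable. Here we propose the Iterative Closest Ellipsoidal Transform (ICET), a scan matching algorithm that makes two novel improvements to the current state-of-the-art Normal Distributions Transform (NDT). Like NDT, ICET decomposes LIDAR data into voxels and fits a Gaussian distribution to the points within each voxel. The first innovation of ICET reduces geometric ambiguity along large flat surfaces by suppressing the solution along those directions. The second innovation of ICET is to infer the output error covariance associated with the position and orientation transformation between successive point clouds; the error covariance is particularly useful when ICET is incorporated into a state estimation routine such as an extended Kalman filter. We constructed a simulation to compare the performance of ICET and NDT in 2D space with and without geometric ambiguity, and found that ICET produces superior estimates while accurately predicting solution accuracy.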
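The step shared by NDT and ICET, dividing points into voxels and fitting a Gaussian to each, can be sketched in 2D as below. This is a minimal illustration under stated assumptions: the function name, the `min_points` cutoff, and the sample points are hypothetical, and neither ICET's ambiguity suppression nor its covariance estimate is implemented.

```python
from collections import defaultdict


def fit_voxel_gaussians(points, voxel_size, min_points=3):
    """Group 2D points into square voxels and fit a mean and covariance to each."""
    voxels = defaultdict(list)
    for x, y in points:
        key = (int(x // voxel_size), int(y // voxel_size))
        voxels[key].append((x, y))

    fits = {}
    for key, pts in voxels.items():
        n = len(pts)
        if n < min_points:
            continue  # too few points for a meaningful covariance
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Unbiased sample covariance entries.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        fits[key] = ((mx, my), ((sxx, sxy), (sxy, syy)))
    return fits


# Three points on a flat wall along x, plus one isolated point.
points = [(0.1, 0.5), (0.5, 0.5), (0.9, 0.5), (1.2, 0.2)]
fits = fit_voxel_gaussians(points, voxel_size=1.0)
print(fits)
```

In this example the wall voxel has all of its variance along x and none along y, the kind of elongated distribution that arises on large flat surfaces.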
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
A canonical algorithm for log-concave sampling is the Langevin Algorithm, aka the Langevin Diffusion run with some discretization stepsize $\eta > 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $\pi_{\eta}$ which differs from the stationary distribution $\pi$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $\pi$ extend to $\pi_{\eta}$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $\pi$, the analogous properties for $\pi_{\eta}$ are open questions with direct algorithmic implications. This note provides a first step in this direction by establishing concentration results for $\pi_{\eta}$ that mirror classical results for $\pi$. Specifically, we show that for any nontrivial stepsize $\eta > 0$, $\pi_{\eta}$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution $\pi_{\eta}$ without going through the continuous-time stationary distribution $\pi$ as an intermediary.
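The gap between $\pi_{\eta}$ and $\pi$ can be seen in a one-dimensional example. The sketch below uses the standard Langevin Algorithm update $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k$ with the potential $f(x) = x^2/2$, chosen here for illustration rather than taken from the note. For this potential $\pi$ is the standard Gaussian with variance 1, while the stationary variance $V$ of the discretized chain solves $V = (1-\eta)^2 V + 2\eta$, giving $V = 1/(1 - \eta/2)$ for $0 < \eta < 2$. Function names, the seed, and the step count are assumptions.

```python
import math
import random


def langevin_chain(grad, x0, eta, steps, rng):
    """Run the Langevin Algorithm in 1D and return the visited states."""
    x = x0
    samples = []
    for _ in range(steps):
        x = x - eta * grad(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def stationary_variance_quadratic(eta):
    """Variance of pi_eta for f(x) = x^2 / 2, valid for 0 < eta < 2."""
    return 1.0 / (1.0 - eta / 2.0)


rng = random.Random(0)
eta = 0.5
samples = langevin_chain(lambda x: x, 0.0, eta, 200_000, rng)
burned = samples[1000:]  # discard an initial transient
mean = sum(burned) / len(burned)
empirical_var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(empirical_var, stationary_variance_quadratic(eta))
```

The empirical variance tracks the closed form for $\pi_{\eta}$ rather than the variance of $\pi$, and the two coincide only in the limit $\eta \to 0$.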
We explore the use of large language models (LLMs) for zero-shot semantic parsing. Semantic parsing involves mapping natural language utterances to task-specific meaning representations. Language models are generally trained on publicly available text and code and cannot be expected to directly generalize to domain-specific parsing tasks in a zero-shot setting. In this work, we propose ZEROTOP, a zero-shot task-oriented parsing method that decomposes a semantic parsing problem into a set of abstractive and extractive question-answering (QA) problems, enabling us to leverage the ability of LLMs to zero-shot answer reading comprehension questions. For each utterance, we prompt the LLM with questions corresponding to its top-level intent and a set of slots and use the LLM generations to construct the target meaning representation. We observe that current LLMs fail to detect unanswerable questions and, as a result, cannot handle questions corresponding to missing slots. To address this problem, we fine-tune a language model on public QA datasets using synthetic negative samples. Experimental results show that our QA-based decomposition paired with the fine-tuned LLM can correctly parse ~16% of utterances in the MTOP dataset without requiring any annotated data.
Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 3.8$\times$ and parse tree node-type overlap by 1.4$\times$ relative to current approaches for private synthetic data generation, improving both on fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show gains of 1.3$\times$ on its accuracy with the new feature.
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can ``leak'' onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
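Leakage can be illustrated with a toy model in which only the end-of-sequence (EOS) probability at each step matters. The two EOS schedules below are assumptions chosen for illustration and do not come from the paper. With a constant EOS probability the model terminates with probability 1. With EOS probability $1/(t+1)^2$ at step $t$, the probability of never terminating is $\prod_{k \ge 2} (1 - 1/k^2) = 1/2$, so half of the mass sits on infinite sequences.

```python
def termination_mass(eos_prob, max_len):
    """Probability that a string has terminated within max_len steps.

    eos_prob(t): probability of emitting EOS at step t (1-indexed),
    given that no EOS was emitted before step t.
    """
    survive = 1.0
    for t in range(1, max_len + 1):
        survive *= 1.0 - eos_prob(t)
    return 1.0 - survive


# Constant EOS probability: mass on finite strings approaches 1 (tight).
tight = termination_mass(lambda t: 0.1, 1000)
# Rapidly decaying EOS probability: mass on finite strings approaches 1/2.
leaky = termination_mass(lambda t: 1.0 / (t + 1) ** 2, 1000)
print(tight, leaky)
```

The partial product telescopes to $(N+1)/(2N)$ with $N = 1001$ after 1000 steps, so the second value stays just below $1/2$ however long the model is allowed to run.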
From smoothly pursuing moving objects to rapidly shifting gazes during visual search, humans employ a wide variety of eye movement strategies in different contexts. While eye movements provide a rich window into mental processes, building generative models of eye movements is notoriously difficult, and to date the computational objectives guiding eye movements remain largely a mystery. In this work, we tackled these problems in the context of a canonical spatial planning task, maze-solving. We collected eye movement data from human subjects and built deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts. We found that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze. This not only provides a generative model of eye movements in this task but also suggests a computational theory for how humans solve the task, namely that humans use mental simulation.